Introduction

This notebook introduces SMCP3 by walking through the development of an SMCP3 algorithm for the inference problem used as the motivating example in this talk on SMCP3. You may find it useful to watch the beginning of the talk for a quick introduction to the problem setting, and to see the visualizations we will produce in this notebook.

When running this notebook, Jupyter sometimes gives error messages related to displaying the animations. Restarting Jupyter with the following flag solves this problem:

jupyter notebook --NotebookApp.iopub_msg_rate_limit=10e8

Now, let’s dive right in!

using Pkg; Pkg.activate("../..")
  Activating project at `~/Developer/research/SMCP3OpenSource/GenSMCP3.jl`
# Include this during development so that changes in the static code files
# are loaded immediately by the Julia instance.
using Revise

Probabilistic Model

We consider a probabilistic model in which an object performs a random walk in \(\mathbb{R}^D\). That is, we set \(z_0 = \vec{0}\), and for every \(t > 0\), we have \(z_t \sim \mathcal{N}(z_{t-1}, \sigma_z I_D)\), where \(I_D\) is the \(D\)-dimensional identity matrix.

Under this probabilistic model, we assume that at every timestep, we receive a noisy observation of the object’s position, \(y_t\). Specifically, \(y_t \sim \mathcal{N}(z_t, \sigma_y I_D)\).

The goal of probabilistic inference in this model will be: given a stream of noisy observations, \(y_{1:T} = (y_1, y_2, \dots, y_T)\), understand what latent trajectory the underlying object may have moved along, to produce those observations. Specifically, we will characterize the posterior distribution \(P(z_{1:t} = \cdot | y_{1:t})\).

(In this particular model, inference can be done exactly using a Kalman filter. In this notebook, we will develop an approximate inference algorithm using SMCP3, which can be used to perform inference in this model and in a wide range of other models – including models whose likelihoods are not differentiable. While SMCP3 is not needed for this particular toy random-walk model, since Kalman filtering already suffices, we use this model in our tutorial because it is very simple.)
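
For reference, here is what the exact solution looks like. Because each dimension of the random walk evolves and is observed independently, a scalar Kalman filter per dimension suffices. The following is an illustrative sketch in Python (the function name and code are ours, not part of this notebook's codebase), assuming standard deviations σ_z and σ_y as in the model above:

```python
def kalman_filter(ys, sigma_z=1.0, sigma_y=1.0):
    """Exact filtering for z_t ~ Normal(z_{t-1}, sigma_z), y_t ~ Normal(z_t, sigma_y).
    Returns the means and variances of P(z_t | y_{1:t}) for each t."""
    m, P = 0.0, 0.0              # z_0 = 0 with certainty
    means, variances = [], []
    for y in ys:
        # Predict: z_t | y_{1:t-1} ~ Normal(m, P + sigma_z^2)
        P_pred = P + sigma_z**2
        # Update with the observation y_t.
        K = P_pred / (P_pred + sigma_y**2)   # Kalman gain
        m = m + K * (y - m)
        P = (1 - K) * P_pred
        means.append(m)
        variances.append(P)
    return means, variances

means, variances = kalman_filter([1.0, 0.5, 0.8])
```

With equal noise scales, the filter mean is pulled halfway toward each new observation at the first step, and the posterior variance settles to a steady state below the prior's.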

Probabilistic Program implementation of the Model

Below, we define this probabilistic model as a Gen probabilistic program. First, we define global variables for \(\sigma_z\) and \(\sigma_y\).

# Global variables σ_z and σ_y controlling the random walk speed and the observation noise
Z_STD() = 1.
Y_STD() = 1.
Y_STD (generic function with 1 method)

Then, we define a Gen probabilistic program specifying what new values become instantiated by the model at each timestep.

using Gen
@gen (static, diffs) function step_model(t, zₜ₋₁)::Vector{Float64}
    # We use Gen's `broadcasted_normal` distribution.
    # If μ is a D-dimensional vector, then `broadcasted_normal(μ, σ)`
    # is the Gaussian distribution with mean μ and covariance matrix σ²*I,
    # where I is the D-dimensional identity matrix.

    # The latent state evolves according to a random walk.
    zₜ ~ broadcasted_normal(zₜ₋₁, Z_STD())
    
    # A noisy measurement of the latent state is observed.
    yₜ ~ broadcasted_normal(zₜ, Y_STD())
    
    # Output the new latent position, to be fed in at the next
    # timestep as the value zₜ₋₁.
    return zₜ
end
var"##StaticGenFunction_step_model#329"(Dict{Symbol, Any}(), Dict{Symbol, Any}())

Then, we define a Gen probabilistic program which simulates from this step model for T timesteps. To do this, we use Gen’s Unfold combinator.

@gen (static, diffs) function model(D, T)
    z₀ = zeros(D) # D-dimensional vector of zeros
    steps ~ Unfold(step_model)(T, z₀)
    return steps
end
var"##StaticGenFunction_model#341"(Dict{Symbol, Any}(), Dict{Symbol, Any}())

For performance, we wrote the above probabilistic programs using Gen’s static DSL for writing probabilistic programs, which can compile the probabilistic programs to speed up probabilistic inference. We must call the following macro so Gen knows these probabilistic programs are ready to be compiled when they are needed.

@load_generated_functions()

Visualizing latent states generated from the model

Now, let’s generate a random latent trace from this probabilistic model. We’ll set \(D=2\), so we can visualize the object as moving around in the plane.

## Generate a random trace from the model
trace_1 = Gen.simulate(
    model,
    ( # Arguments to pass into the probabilistic model
        2, # D = 2 (2-dimensional simulation)
        4 # T = 4 (simulate 4 timesteps)
    )
);

This simulation generated a “trace” from the probabilistic model, containing all the \(z_t\) and \(y_t\) values from one simulation of the random walk model. We can have Gen print out the collection of all the random choices stored in this trace:

get_choices(trace_1)
│
└── :steps
    │
    ├── 1
    │   │
    │   ├── :zₜ : [0.42602834835642395, -1.6296511028643599]
    │   │
    │   └── :yₜ : [0.9898224428399893, -2.8982906037536074]
    │
    ├── 2
    │   │
    │   ├── :zₜ : [-0.6509108574990858, -1.1431379512283615]
    │   │
    │   └── :yₜ : [0.3359904091532965, -0.8623329421862657]
    │
    ├── 3
    │   │
    │   ├── :zₜ : [-0.07516319780998115, -1.9946561099455804]
    │   │
    │   └── :yₜ : [-0.9674520819950085, -2.5506208955840632]
    │
    └── 4
        │
        ├── :zₜ : [0.27923589356757517, -1.8163051932449021]
        │
        └── :yₜ : [1.1333993244926113, -0.8451504948226879]

We can get any specific value by indexing into the trace with the proper address:

trace_1[:steps => 2 => :zₜ]
2-element Vector{Float64}:
 -0.6509108574990858
 -1.1431379512283615

We’ll also define the following helper function:

function get_x(trace, t)
    if t == 0
        D = get_args(trace)[1]
        return zeros(D)
    else
        return trace[:steps => t => :zₜ]
    end
end
get_x (generic function with 1 method)

Now, let’s generate a trace with 50 timesteps.

trace_50 = Gen.simulate(model, (2, 50));
observations = [trace_50[:steps => t => :yₜ] for t=1:50]
observations
50-element Vector{Vector{Float64}}:
 [1.075781171993675, 0.5225368039979145]
 [0.30648408399112825, 2.6919531435646293]
 [3.589285043378361, 4.260725247584654]
 [2.675026674859555, 4.967662431016679]
 [1.6583667978660914, 4.296741160531252]
 [1.1718302690537648, 5.289505776726546]
 [4.505891059931557, 7.168509410275103]
 [4.7990351702707414, 9.124818585741401]
 [2.021747269236077, 10.049824043142813]
 [-0.8067310229026994, 8.983241331957604]
 [0.3177925348836881, 10.441901258947876]
 [-2.3241175562292202, 12.122446151183185]
 [-2.5155520413920485, 12.788072839912545]
 ⋮
 [-7.692543268353392, 26.142249424902793]
 [-8.80023713015943, 26.140885325106478]
 [-9.88698620514221, 30.124997156565595]
 [-12.148610248957382, 30.943176843558824]
 [-10.947012446979949, 28.999770034910725]
 [-9.693094762078506, 27.27337720641297]
 [-9.099415725898062, 30.22160659552356]
 [-6.819114115075519, 30.35073725430037]
 [-10.267834547688844, 30.080865234026678]
 [-11.258965660009597, 27.675840947941765]
 [-8.304412583048862, 27.672659838267695]
 [-6.45955901483878, 26.25244062975913]

And let’s visualize what the latent trajectory and observations from this trace looked like.

# Include the visualization code
includet("../utils/visualize/visualize.jl")
# Global variable: visualize traces on an axis with the X and Y axes bounded between -15 and 25.
VIZ_DOMAIN() = ((-15, 25), (-15, 25))
VIZ_DOMAIN (generic function with 1 method)
(figure_1, animated_time_1) = plot_trajectory_and_observations(
    VIZ_DOMAIN(),
    t -> observations[t],
    t -> trace_50[:steps => t => :zₜ]
);
figure_1
# Animate the above visual, from time 1 to time 50, at 10 FPS
@async animate!(animated_time_1, 50, framerate=10)
Task (done) @0x0000000296d8c8b0

We can also visualize the latent object’s trajectory on its own, without the observed data:

# Plot just the latent trajectories
(figure_trajectory, time_trajectoryfig) = plot_trajectory(
    VIZ_DOMAIN(),
    t -> trace_50[:steps => t => :zₜ]
);
figure_trajectory
@async animate!(time_trajectoryfig, 50, framerate=10)
Task (done) @0x00000002944b3260

Likewise, we can visualize the observations without the latent trajectory:

## Plot just the observations
(f_obs, t_obs) = plot_obs(
    t -> (observations[t],),
    VIZ_DOMAIN(),
    n_backtrack=4
);
f_obs
@async animate!(t_obs, 50, framerate=10)
Task (done) @0x0000000297b28180

When you see the above set of observations, can you infer what the latent trajectory looks like?

Probabilistic Inference

Now we will write a sequential Monte Carlo algorithm to solve the problem of inferring a latent trajectory, given the noisy observations of the object.

A challenging observation sequence to run inference on

First, we will load up a sequence of observed values I generated offline, which we will try to run inference on.

challenging_observations = include("challenging_observations.jl")
51-element Vector{Vector{Float64}}:
 [0.5803564054939987, 0.4721036401961641]
 [2.3582240250526056, 1.8776227764167297]
 [3.8242045644333027, 3.0494060340858513]
 [4.926177421743004, 5.769120654431916]
 [6.537776525431695, 7.238324648540233]
 [8.614623308468717, 6.919109663193253]
 [9.684150229157632, 6.681534719065196]
 [12.973110928384948, 8.539501275927721]
 [16.23817326259173, 11.296762069793202]
 [18.828736671812873, 12.89932246230315]
 [22.58017851689378, 13.978437325392978]
 [21.43522045704328, 11.691843037161787]
 [21.221920561916022, 9.402747051628198]
 ⋮
 [2.5474335987648966, -10.21042432468431]
 [1.213798439716259, -10.875293970033454]
 [1.9515398657232808, -10.995749772196021]
 [3.048764304629288, -9.439122832792158]
 [5.89415930800625, -10.485610701041784]
 [4.559725983714342, -8.913428897776626]
 [4.076567284106472, -6.388742515633303]
 [4.189009193744198, -5.227214817273897]
 [5.22396378118641, -3.4505244637433585]
 [4.320535629891629, 0.6069459835181732]
 [2.9497944548470034, 5.206137504735176]
 [2.4527707523093283, 5.883098325213314]
## Plot just the observations
(f_hardobs, t_hardobs) = plot_obs(
    t -> (challenging_observations[t],),
    VIZ_DOMAIN(),
    n_backtrack=4
);
f_hardobs
@async animate!(t_hardobs, 50, framerate=10)
Task (done) @0x00000002944f42f0

Can you mentally “run inference”, and get a sense of what the latent trajectory underlying this sequence of observations probably looks like?

Sequential Monte Carlo Inference (aka SMC, or “particle filtering”)

Now, we will write a function which accepts a vector of observed values as input, and runs probabilistic inference via a particle filter.

We will first write a generic particle filtering function. This will include a key degree of freedom: the rule used to update latent hypotheses (particles) in light of new data. Then, we will experiment with the particle filters that result from two different particle-updating rules. First, we will experiment with a simple baseline strategy: the “bootstrap particle update”. Then, we will write and experiment with a smarter strategy for updating particles, implemented via SMCP3.

using GenParticleFilters # Import Gen's particle filtering library

function particle_filter(
    # The vector of observations over which to run inference
    observations,

    # The number of particles to use in SMC.
    # At each timestep, the SMC algorithm will instantiate `n_particles`
    # hypotheses about the latent trajectory which may underlie the observed data.
    n_particles,
        
    # This will be a function which, given a new observation yₜ, outputs
    # a rule for updating latent particles in light of that new observation yₜ.
    # (It can be any Julia object which the `GenParticleFilters.pf_update!` method
    # knows how to interpret as a particle updating rule -- including an SMCP3Update.)
    observation_to_particle_updater;
        
    # In a particle filter, at some timesteps, "resampling" is performed.
    # This duplicates particles (latent hypotheses) which appear likely given the data,
    # and deletes particles which appear unlikely.
    # In this algorithm, we will resample whenever the "effective sample size"
    # of the particle filter falls below `n_particles * ess_threshold_fraction`.
    ess_threshold_fraction=1/5,
        
    # Should SMC just output the final inferences from time `T`?
    # Or should it output what it had inferred at each time t < T?
    output_intermediate_inferences=false,
)
    # The dimensionality of the model is the dimensionality
    # of each observed position.
    D = length(observations[1])
    
    # Initialize the particle filter.
    # This is done by initializing a mutable `state` object which
    # will contain `n_particles` traces from the target probabilistic model,
    # each associated with a real number, its "particle weight".
    # Each trace is a "particle" representing a possible latent trajectory under the model,
    # and each particle's weight characterizes how promising it is relative to the
    # other particles.
    state = pf_initialize(
        model,
        (
            D, # Argument 1 to the model: The dimensionality.
            0, # Argument 2 to the model: The number of timesteps.  We initialize at time 0.
        ),
        
        # Observed choices in the model, at the initial timestep.
        # In this model, there are no observations at time 0,
        # so we pass in an empty choicemap.
        choicemap(),
        
        # The number of particles to use.
        n_particles
    )
    
    if output_intermediate_inferences
        intermediate_inferences = [deepcopy(state)]
    else
        intermediate_inferences = nothing
    end
    
    # For each remaining timestep...
    for t in 1:length(observations)
        new_observation = observations[t]
        
        # Update the particles in the particle filter, in light of the new data point.
        pf_update!(
            state,
            
            # At this new timestep, the arguments to the model are:
            # dimensionality = D (same as before), and
            # time = t (previously, it was t-1).
            (D, t), 
            
            # For performance, tell Gen: this update doesn't change D (but it does change t)
            (NoChange(), UnknownChange()),

            # Tell `pf_update!` what rule to use to update the particles in light of new data.
            observation_to_particle_updater(new_observation)...,

            # Choicemap of new observations
            choicemap((:steps => t => :yₜ, new_observation))
        )
        
        if output_intermediate_inferences
            push!(intermediate_inferences, deepcopy(state))
        end
        
        # Below I will define the method which performs resampling.
        # Understanding this is not crucial to this tutorial.
        # The upshot is that this step will duplicate particles
        # which seem promising, given the observed data,
        # and discard particles which seem unlikely, given the data.
        maybe_perform_resampling!(state, ess_threshold_fraction)
    end
    
    # Output the final inference results as of time `T`,
    # and the inference results the particle filter had obtained
    # at each timestep.
    return (state, intermediate_inferences)
end
particle_filter (generic function with 1 method)
function maybe_perform_resampling!(smc_state, ess_threshold_fraction)
    n_particles = length(smc_state.traces)
    
    # In this particle filtering method, we will run resampling
    # independently among 5 different groups of particles.
    groupsize = Int(floor(n_particles / 5))
    for i in 1:5
        initial_particle_index = 1 + (i - 1) * groupsize
        final_particle_index = initial_particle_index + groupsize - 1

        # Get the SMC state corresponding just to this sub-collection of particles.
        sub_smc_state = smc_state[initial_particle_index:final_particle_index]
        
        # Resample if the effective sample size for this substate is sufficiently low.
        if effective_sample_size(sub_smc_state) < ess_threshold_fraction * floor(n_particles / 5)
            pf_resample!(sub_smc_state, :residual)
        end
    end
end
maybe_perform_resampling! (generic function with 1 method)
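
The “effective sample size” used above can be computed directly from the particle log-weights, via the standard definition ESS = (Σᵢ wᵢ)² / Σᵢ wᵢ². The following is an illustrative Python sketch of that formula (our own helper, not the `GenParticleFilters.effective_sample_size` implementation):

```python
import math

def effective_sample_size(log_weights):
    """ESS = (sum of weights)^2 / (sum of squared weights).

    Subtracting the max log-weight before exponentiating keeps the
    computation numerically stable; the ESS is invariant to this shift."""
    m = max(log_weights)
    ws = [math.exp(lw - m) for lw in log_weights]
    return sum(ws)**2 / sum(w**2 for w in ws)

# Equal weights: ESS equals the number of particles.
print(effective_sample_size([0.0, 0.0, 0.0, 0.0]))   # -> 4.0
# One particle dominates: ESS approaches 1, triggering resampling.
print(effective_sample_size([0.0, -50.0, -50.0]))
```

When all the weight mass concentrates on a few particles, the ESS drops toward 1, which is exactly the degeneracy that resampling counteracts.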

The bootstrap particle update

First, we’ll implement a particle-updating strategy called the bootstrap particle update.

The bootstrap particle update accepts as input a particle \(z_{1:t-1}\), which is a trajectory over \(t-1\) time steps. It also receives the new observation \(y_t\) as input. It then samples a value \(z_t\) from \(P(z_t = \cdot | z_{t-1})\), and extends the trajectory with it. The new trajectory \(z_{1:t}\) is used as the updated particle.

Note that this is not a very intelligent particle-updating strategy, because it ignores the information in the observed value \(y_t\).
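
Stripped of Gen's machinery, one bootstrap step amounts to “sample from the transition prior, then weight by the observation likelihood”. Here is an illustrative sketch in Python (1D latents for brevity; all names are ours, not part of this notebook's codebase):

```python
import math, random

def normal_logpdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma)**2 - math.log(sigma * math.sqrt(2 * math.pi))

def bootstrap_step(particles, log_weights, y_t, sigma_z=1.0, sigma_y=1.0):
    """Extend each particle with z_t ~ P(z_t | z_{t-1}).  For this proposal,
    the incremental particle weight is the likelihood P(y_t | z_t)."""
    new_particles, new_log_weights = [], []
    for z_prev, lw in zip(particles, log_weights):
        z_t = random.gauss(z_prev, sigma_z)   # note: ignores y_t entirely
        new_particles.append(z_t)
        new_log_weights.append(lw + normal_logpdf(y_t, z_t, sigma_y))
    return new_particles, new_log_weights
```

Because the proposal never looks at \(y_t\), particles that happen to step away from the observation only get penalized after the fact, through the weight.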

Telling Gen to run the bootstrap update, via a 1-liner

Since this particle updating strategy only requires knowledge of the model \(P(z_t = \cdot | z_{t-1})\), which was implicitly defined when we wrote the probabilistic model program, Gen can automatically run SMC with the bootstrap particle update. We simply tell the particle_filter function to use the default particle-updating behavior – which in Gen is the bootstrap update – by passing it an empty tuple () as the particle updating strategy.

function bootstrap_particle_filter(observations, n_particles; kwargs...)
     return particle_filter(
        observations, n_particles,

        # To specify to Gen to use the bootstrap particle updating strategy, we simply use the empty tuple
        # as the particle updating rule.
        # This tells `GenParticleFilters.pf_update!` to use its default particle-updating behavior,
        # which is the bootstrap particle update.
        y -> ();
        
        kwargs...
    )
end
bootstrap_particle_filter (generic function with 1 method)

Now, we can run inference!

(final_bootstrap_inferences, bootstrap_inference_results) = bootstrap_particle_filter(
    challenging_observations,
    30;
    output_intermediate_inferences=true
);

Now, we’ll visualize the results of inference. First, we define a function, state_to_weights_and_trajectories, which extracts the hypothetical latent trajectories, and their particle weights, from the SMC inference result. Then, we will visualize these weighted trajectories over time.

# Define a function which accepts an inference state as input,
# and outputs a weighted particle collection from SMC, to visualize.
function state_to_weights_and_trajectories(smc_state)
    n_particles = length(smc_state.traces)
    return [
        (
            # The log of the particle weight
            smc_state.log_weights[i],
            
            # The latent trajectory represented by this particle.
            # (`pf_state.traces[i]` is a particle from SMC.  It is a trace from the probabilistic model.)
            [smc_state.traces[i][:steps => t => :zₜ] for t in 1:get_args(smc_state.traces[i])[2]]
        )
        for i in 1:n_particles
    ]
end
state_to_weights_and_trajectories (generic function with 1 method)
f_bootstrap, t_bootstrap = plot_observation_and_particles(
    VIZ_DOMAIN(),
    t -> challenging_observations[t], # observations
    
    # At time `T`, visualize the inferred trajectories as of time T.
    # (Index at T + 1, because the initial timestep is T=0, and Julia is 1-indexed.)
    T -> state_to_weights_and_trajectories(bootstrap_inference_results[T + 1])
)
f_bootstrap
@async animate!(t_bootstrap, 50, framerate=5)
Task (done) @0x000000029bf75880

The results from this inference algorithm sort of tracked the object, but not very well. At some timesteps, the particle filter can be very wrong about where it thinks the object is! For instance, look what happens at time 11.

t_bootstrap[] = 11
11

The issue with this algorithm is that, as noted above, the particle updating strategy is not very intelligent.

Soon, we will write a new particle updating strategy which is much more intelligent than the bootstrap update. This will result in much more successful probabilistic inferences. To write this update, we will need to use SMCP3: SMC with probabilistic program proposals.

To introduce SMCP3, before presenting the new, smarter particle updating strategy, let’s walk through how we would implement the bootstrap particle update we saw above using SMCP3.

Re-implementing the bootstrap particle update, using SMCP3

Above, we utilized the fact that Gen runs the bootstrap update as its default particle-updating behavior. But if this weren’t a default in Gen, we could still implement the bootstrap update, by writing a probabilistic program proposal which defines the bootstrap particle update for this model. To introduce some of the key ideas of SMC, we will do that below.

We will write this particle updating strategy as a probabilistic program in a new DSL, triggered using the @kernel macro.

This probabilistic program proposal will accept a trace prev_trace from time \(t-1\) as input. prev_trace contains the trajectory \(z_{1:t-1}\) in it (as well as all the observations \(y_{1:t-1}\)). The proposal will also receive the value \(y_t\) as input.

The proposal will output an update specification which tells Gen how to update prev_trace in light of \(y_t\). Specifically, the update specification will be a Gen.ChoiceMap object which tells Gen what value to use for every new or changing latent value in the updated trace. In this case, it needs to set a value for \(z_t\), so the choicemap will specify a value at address :steps => t => :zₜ.

For reasons we’ll see later, for Gen to be able to use this proposal distribution in an SMCP3 algorithm, the proposal needs to output one other value, which in this example is simply an empty choice map.

using GenSMCP3: @kernel
@kernel function bootstrap_proposal(prev_trace, new_observation)
    t = get_args(prev_trace)[2] + 1
    zₜ₋₁ = get_x(prev_trace, t - 1)
    
    # Sample a new value from P(zₜ = . | zₜ₋₁), the model's
    # transition prior.
    new_zₜ ~ broadcasted_normal(zₜ₋₁, Z_STD())
    
    proposed_latent_values_for_updated_trace = choicemap((:steps => t => :zₜ, new_zₜ))
    return (
        # Update specification choicemap.
        proposed_latent_values_for_updated_trace,
        
        # The other choicemap -- which will be explained later
        choicemap()
    )
end
GenTraceKernelDSL.Kernel(var"#390#391"())

A note on how Gen updates prev_trace. We will ultimately use this proposal distribution in the particle_filter function defined above. Specifically, our call to pf_update! will call this proposal distribution to update the SMC particles. The specific way that prev_trace will be updated is as follows. First, the proposal distribution above will be run, and it will return proposed_latent_values_for_updated_trace. Then, pf_update! will update prev_trace into a new trace by running:

updated_trace, _ = Gen.update(
    # The trace to update
    prev_trace,
    
    # The new arguments to the trace, after the update.
    # (D stayed the same; t increased by one.)
    (D, t,), 
    
    # A choicemap specifying the value to set every new or changed latent
    # value to, and also the value for every new observed value in the trace.
    Gen.merge(
        proposed_latent_values_for_updated_trace,
        observed_value_choicemap
    )
)

In this example, the observed_value_choicemap will be choicemap((:steps => t => :yₜ, yₜ)) where yₜ is the newly observed value. In general, it will be a choicemap specifying all the new observed values in the probabilistic model, at this timestep. (In the definition of particle_filter, you can notice that we give pf_update! the observed value choicemap as input, so it can use it for this update.)

That is, Gen will update the trace by:

1. Changing the argument to the model probabilistic program.
2. Filling in the newly observed values in the updated trace with the actual observations.
3. Filling in new latent values (or overwriting existing latent values) in the trace with the values in the choicemap output by the probabilistic program proposal. (We called this choicemap proposed_latent_values_for_updated_trace.)

The backward proposal distribution

A digression on why particles must have particle weights in SMC. To run SMC, it is not enough to be able to update a particle from time \(t-1\) into a particle at time \(t\). The issue is that we ultimately want our particles – in this case, hypothetical latent trajectories \(z_{1:t}^i\) (where \(i\) is an index specifying one of the many hypothesized particles) – to tell us something about the posterior \(P(z_{1:t} = \cdot | y_{1:t})\). But if we generate \(z_{1:t}\) by sequentially running a particle updating probabilistic program, which may bear no relation to this posterior distribution, then the actual hypothetical latent trajectories SMC will generate could be totally unrelated to \(P(z_{1:t} = \cdot | y_{1:t})\)! In order to make it so that the particles generated by SMC are actually informative about this posterior, SMC also computes particle weights. For each particle \(z_{1:t}^i\), the particle weight is a number \(w_t^i\) which characterizes how different the distribution actually used to generate \(z_{1:t}^i\) is from the posterior distribution we want to target with inference. It turns out that if we can compute such particle weights, SMC can correct for any biases in the sampling process.
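
To make the role of weights concrete, here is a tiny self-normalized importance-sampling sketch (illustrative Python, not part of this notebook's codebase): samples drawn from the "wrong" distribution still yield a consistent posterior estimate, once each sample is weighted by the ratio of the target density to the density actually used for sampling.

```python
import math, random

def normal_logpdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma)**2 - math.log(sigma * math.sqrt(2 * math.pi))

random.seed(1)
# Target distribution: Normal(2, 1).  Sampling distribution: Normal(0, 3),
# deliberately mismatched -- analogous to a proposal unrelated to the posterior.
samples = [random.gauss(0.0, 3.0) for _ in range(200_000)]
log_w = [normal_logpdf(x, 2.0, 1.0) - normal_logpdf(x, 0.0, 3.0) for x in samples]

# Self-normalize the weights (stably, in log space) and estimate the target mean.
m = max(log_w)
w = [math.exp(lw - m) for lw in log_w]
posterior_mean = sum(wi * xi for wi, xi in zip(w, samples)) / sum(w)
print(posterior_mean)   # close to the target mean, 2.0
```

Unweighted, these samples would average to roughly 0; the weights are what correct for the bias of the sampling process, exactly as particle weights do in SMC.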

How does SMC compute weights? The question then becomes: if we give SMC an arbitrary probabilistic program proposal, like the above, how can SMC know how to compute valid weights for the particles, which correctly account for the difference between the target posterior distribution and the particle-generating distribution?

The backward proposal distribution. It turns out that this problem of computing particle weights can be reduced to a different problem: the problem of writing backward proposal distributions. A backward proposal accepts an updated particle (in this model, \(z_{1:t}\)) as input, and it outputs two things. First, it proposes: what was the particle that may have been input to the “forward” proposal distribution, given that it output \(z_{1:t}\). (In this case, it is guaranteed to have been \(z_{1:t-1}\).) Second, it proposes: what random choices may the forward proposal have made, to update this old particle into the updated particle? (In this case, the choice would have had to be the value \(z_t\).)

Given a “forward” proposal distribution which updates a particle from time \(t-1\) into a particle from time \(t\) (and outputs the one other value we will return to later), and a “backward” proposal which inverts this, the SMCP3 algorithm can automatically compute particle weights, and can thus run sequential Monte Carlo.
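
For reference (this is the general form from the SMCP3 paper, stated here informally), writing \(u\) for the random choices made by the forward proposal \(K\) and \(u'\) for those of the backward proposal \(L\), the incremental particle weight is

\[
w_t = \frac{p(z_{1:t}, y_{1:t}) \; L(u' \mid z_{1:t}, y_{1:t})}{p(z_{1:t-1}, y_{1:t-1}) \; K(u \mid z_{1:t-1}, y_t)}.
\]

When \(K\) is the bootstrap proposal (so \(u = z_t\), with density \(p(z_t \mid z_{t-1})\)) and \(L\) is its deterministic inverse, this ratio reduces to the familiar likelihood weight \(p(y_t \mid z_t)\).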

As noted two paragraphs ago, in the bootstrap update, there is only one valid backward proposal. Let’s write it now as a probabilistic program.

@kernel function backward_proposal_for_bootstrap_update(updated_particle, observation_from_time_t)
    t = get_args(updated_particle)[2]
    
    return (
        # The first return value needs to specify how to update the `updated_particle` into a
        # particle from time t-1.  Specifically, we need to remove the values zₜ and yₜ from the trace.
        # This removal will automatically happen when Gen changes the argument to the probabilistic model
        # from t to t-1, so we do not need to specify any constraints on the latent values in the
        # updated trace.
        choicemap(),
        
        # The second return value must propose a value for every choice
        # the forward proposal would make during the update from the previous
        # trace specified by the first return value of this proposal.
        # In this case, we need to specify that the forward proposal
        # would have had to sample the actual zₜ value in `updated_particle`
        # in order to produce `updated_particle` as its proposed updated trace.
        choicemap(
            (:new_zₜ, updated_particle[:steps => t => :zₜ])
        )
     )
end
GenTraceKernelDSL.Kernel(var"#392#393"())

To actually turn updated_trace back into prev_trace, Gen would run the backward proposal, assign its first output value to a variable update_choicemap_output_by_backward_proposal, and then call:

prev_trace, _ = Gen.update(
    # The trace to update
    updated_trace,
    
    # The new arguments to the trace, after the update.
    # (D stays the same; t is being decreased by one.)
    (D, t - 1,), 
    
    # A choicemap specifying the value with which to overwrite
    # any choices which may have changed between the previous
    # and new trace, and the value for any choices which
    # existed at time t-1 but not time t.
    update_choicemap_output_by_backward_proposal
)

While running SMCP3, this update never actually needs to occur, since the backward proposal is only used for computing particle weights.

Running the bootstrap update via SMCP3

With these proposals defined, we can tell the particle_filter function to run the bootstrap particle filter, using an SMCP3Update object as the specification for the particle-updating strategy.

using GenSMCP3: SMCP3Update

function bootstrap_particle_filter_via_SMCP3(observations, n_particles; kwargs...)
     return particle_filter(
        observations, n_particles,
        
        # Here we tell the `particle_filter` function to update particles
        # using the SMCP3 move with the forward and backward proposals we
        # defined above.
        # We also tell it that it should give the proposal distributions access
        # to the new observation, `y`, by giving it to the proposal distributions as their second argument.
        # (Due to the way we wrote `particle_filter`, we need to wrap the SMCUpdate object in a tuple.)
        y -> ( 
            SMCP3Update(
                bootstrap_proposal, backward_proposal_for_bootstrap_update,
                (y,), # This specifies the arguments to give the forward proposal, after `prev_trace`.
                (y,) # This specifies the arguments to give the backward proposal, after `updated_trace`.
            ),
        );
        kwargs...
    )
end

Since this is the same algorithm as above, when we run it, we observe the same issues as we did before:

# Run inference
(final_bootstrapSMCP3_inferences, bootstrapSMCP3_inference_results) = bootstrap_particle_filter_via_SMCP3(
    challenging_observations,
    30;
    output_intermediate_inferences=true
);
# Visualize
f_bootstrapSMCP3, t_bootstrapSMCP3 = plot_observation_and_particles(
    VIZ_DOMAIN(),
    t -> challenging_observations[t], # observations
    
    # At time `T`, visualize the inferred trajectories as of time T.
    # (Index at T + 1, because the initial timestep is T=0, and Julia is 1-indexed.)
    T -> state_to_weights_and_trajectories(bootstrapSMCP3_inference_results[T + 1])
)
f_bootstrapSMCP3
t_bootstrapSMCP3[] = 11

Writing a smarter particle update, via SMCP3

The issue with the bootstrap particle update was that it did not take any information from the new observation \(y_t\) into account when proposing the value \(z_t\). Now, let’s write a smarter particle updating strategy which does take this information into account.

Let’s have our particle update take the following steps.

1. As before, sample a value from \(P(z_t = \cdot | z_{t-1})\). But this time, we will not use this value as our final \(z_t\). Instead, we will call this value \(v\), so we will have: \(v \sim P(z_t = \cdot | z_{t-1})\).

2. Then, starting from \(v\), take a step along the gradient of the log density \(\log p(y_t, z_t = \cdot | z_{t-1})\). The gradient of this joint log density does incorporate information from the observation \(y_t\). We’ll call the resulting value \(z'\). Specifically, we’ll have: \(z' = v + \epsilon \nabla_v \log p(y_t, z_t = v | z_{t-1})\), for some step size \(\epsilon\).

3. Finally, apply a small perturbation to \(z'\) to get the final value: \(z_t \sim \mathcal{N}(z', \sqrt{2\epsilon} I_D)\).

Steps 2 and 3 constitute an algorithm called “Unadjusted Langevin Ascent”, or ULA. In fact, let’s define steps 2 and 3 as a Gen distribution, using its DSL for writing simple distributions.

ULA_STEP_SIZE() = 0.8

function ula_mean(zₜ₋₁, zₜ, yₜ)
    grad = logpdf_grad(broadcasted_normal, zₜ, zₜ₋₁, Z_STD())[1] .+ logpdf_grad(broadcasted_normal, yₜ, zₜ, Y_STD())[2]
    return zₜ + ULA_STEP_SIZE() * grad
end

@dist ULA(zₜ₋₁, zₜ, yₜ) = broadcasted_normal(ula_mean(zₜ₋₁, zₜ, yₜ), sqrt(2 * ULA_STEP_SIZE()))
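As a sanity check on the ULA mean, here is a minimal plain-Julia sketch of the same computation, using the closed-form Gaussian gradient instead of Gen’s `logpdf_grad`. (The function names and the default step size and standard deviations below are illustrative assumptions for this sketch, not the notebook’s constants.)

```julia
# Closed-form gradient of log p(yₜ, zₜ | zₜ₋₁) with respect to zₜ,
# for isotropic Gaussian transition (std σz) and observation (std σy) noise:
#   ∇ log p = -(zₜ - zₜ₋₁)/σz² + (yₜ - zₜ)/σy²
function ula_mean_closed_form(zₜ₋₁, zₜ, yₜ; ϵ=0.8, σz=1.0, σy=1.0)
    grad = -(zₜ .- zₜ₋₁) ./ σz^2 .+ (yₜ .- zₜ) ./ σy^2
    return zₜ .+ ϵ .* grad
end

# One full ULA step: the gradient move, plus Gaussian noise with std √(2ϵ).
function ula_step_closed_form(zₜ₋₁, zₜ, yₜ; ϵ=0.8, σz=1.0, σy=1.0)
    μ = ula_mean_closed_form(zₜ₋₁, zₜ, yₜ; ϵ=ϵ, σz=σz, σy=σy)
    return μ .+ sqrt(2ϵ) .* randn(length(μ))
end
```

For example, starting at \(v = z_{t-1} = 0\) with an observation at 1 (and unit noise scales), the gradient is 1, so the perturbation kernel’s mean moves a fraction \(\epsilon\) of the way toward the observation.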

Now that we have defined the ULA distribution, we can write a probabilistic program proposal implementing the above strategy.

@kernel function smart_forward_proposal(prev_trace, new_observation)
    t = get_args(prev_trace)[2] + 1
    zₜ₋₁ = get_x(prev_trace, t - 1)

    # Step 1: sample v
    v ~ broadcasted_normal(zₜ₋₁, Z_STD())

    # Steps 2 and 3: run ULA
    z ~ ULA(zₜ₋₁, v, new_observation)

    return (
        # Return value 1: a choicemap specifying
        # the new latent value in the updated trace.
        choicemap(
            (:steps => t => :zₜ, z)
        ),
    
        # Return value 2 (to be elaborated on soon):
        # a choicemap specifying the random choices the backward
        # proposal would make to invert this update.
        choicemap(
            (:v_sampled_by_backward_proposal, v)
        )
    )
end

As before, we also need to write a backward proposal to invert the forward proposal. Its job is to propose (1) what the previous particle value was, and (2) what random choices the forward proposal made.

Proposing the previous particle value is easy: we just remove \(z_t\) from the updated particle. To propose the random choices made by the forward proposal, we’ll need to propose two values: z and v (the two values sampled in ~ lines by the forward proposal). We know for sure what z was: it was the value \(z_t\) in the updated trace. But given the updated trace, we cannot know deterministically what \(v\) was proposed by the forward proposal: any value of \(v\) could have been used. (That said, some values of \(v\) are more likely than others, given the updated particle. After all, we know from the updated particle both \(z_{t-1}\) and \(z_t\), both of which are informative about what \(v\) may have been.) As a result, the backward proposal must sample a value for \(v\). The optimal backward proposal, which would result in the lowest-variance incremental particle weights in SMC, would sample \(v\) from the exact posterior distribution over \(v\) given \(z_{t-1}\) and \(z_t\). But in this example, we still get good-enough particle weights if we use the sub-optimal backward proposal which simply samples \(v\) from \(\mathcal{N}(z_{t-1}, \sigma_z I_D)\).
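(An aside, not needed for the code below: in this particular Gaussian model, the optimal backward proposal over \(v\) happens to be available in closed form. The ULA mean is affine in \(v\), so \(v\) and \(z_t\) are jointly Gaussian, and linear-Gaussian conjugacy gives the exact posterior. Treating \(\sigma_z\) and \(\sigma_y\) as standard deviations, as elsewhere in this notebook, a sketch of the derivation is:

```latex
\mu_{\text{ULA}}(v) = a\,v + b,
\qquad a = 1 - \epsilon\Big(\tfrac{1}{\sigma_z^2} + \tfrac{1}{\sigma_y^2}\Big),
\qquad b = \epsilon\Big(\tfrac{z_{t-1}}{\sigma_z^2} + \tfrac{y_t}{\sigma_y^2}\Big),
\\[0.8em]
p(v \mid z_{t-1}, z_t, y_t)
  = \mathcal{N}\!\left(v;\;
      \Big(\tfrac{1}{\sigma_z^2} + \tfrac{a^2}{2\epsilon}\Big)^{-1}
      \Big(\tfrac{z_{t-1}}{\sigma_z^2} + \tfrac{a\,(z_t - b)}{2\epsilon}\Big),\;
      \Big(\tfrac{1}{\sigma_z^2} + \tfrac{a^2}{2\epsilon}\Big)^{-1/2} I_D
    \right),
```

since \(v \sim \mathcal{N}(z_{t-1}, \sigma_z I_D)\) and \(z_t \mid v \sim \mathcal{N}(a v + b, \sqrt{2\epsilon}\, I_D)\). Sampling \(v\) from this posterior would minimize the weight variance; the simpler \(\mathcal{N}(z_{t-1}, \sigma_z I_D)\) proposal used below trades some weight variance for simplicity.)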

@kernel function backward_proposal_for_smart_forward(updated_trace, new_observation)
    t = get_args(updated_trace)[2]
    zₜ₋₁ = get_x(updated_trace, t - 1)
    
    # We could just call this `v`, but I'm calling it `v_sampled_by_backward_proposal`
    # to highlight that we don't need to sample a value at the same address here
    # as in the forward proposal.  All that matters is that the second choicemap this
    # backward proposal outputs says what the value sampled at address `v` in the forward proposal is.
    v_sampled_by_backward_proposal ~ broadcasted_normal(zₜ₋₁, Z_STD())
    
    return (
        # Choicemap for how to change the latents to update `updated_trace` to
        # a trace at the previous timestep.  (The whole update is managed by decrementing
        # the time argument to the model, so no choices need to be specified.)
        choicemap(),
        
        # Choicemap of all the random choices we are proposing the
        #  forward proposal would make to produce this update.
        choicemap(
            (:v, v_sampled_by_backward_proposal)
        )
    )
end

The second return value of the forward proposal

We have now seen that the forward proposal, \(K\), accepts a particle \(z_{1:t-1}\) from time \(t-1\) as input, and outputs an updated particle \(z_{1:t}\). (More literally, it outputs a specification for how to update the old particle into an updated particle.) We’ve seen that the backward proposal accepts a particle \(z_{1:t}\) as input, and outputs a pair \((z_{1:t-1}, u_K)\), where \(z_{1:t-1}\) is the proposed previous particle, and \(u_K\) is a set of choices which could be made by the forward proposal. The last step to complete an SMCP3 algorithm is to have the forward proposal output a second value, \(u_L\): the set of choices which would be made by the backward proposal to invert its update.

Specifically, we have: \[ K : Z^{t-1} \rightsquigarrow Z^t \times U_L \] \[ L : Z^t \rightsquigarrow Z^{t-1} \times U_K \] where \(\rightsquigarrow\) denotes that a mathematical object is a probabilistic program, where \(Z\) is the space each value \(z_t\) lives in (in our case, \(\mathbb{R}^D\)), where \(U_L\) is the space of choicemaps describing the random choices made by the backward proposal, and where \(U_K\) is the space of choicemaps describing the random choices made by the forward proposal. (So in this case, \(U_L\) is the set of all choicemaps with one address, v_sampled_by_backward_proposal, and \(U_K\) is the set of all choicemaps with two addresses, v and z.)

More generally, latent particles don’t need to live in spaces of trajectories of the form \(Z^t\) (the space \(Z\) to the power of \(t\)). Latent particles can live in arbitrary spaces. If we let \(X_t\) denote the space of latent states at time \(t\), we have \[ K : X_{t-1} \rightsquigarrow X_t \times U_L \] \[ L : X_t \rightsquigarrow X_{t-1} \times U_K \]

Hopefully it is evident how the forward and backward proposals are somewhat symmetrical!

The major constraint on the forward and backward proposal is as follows. If we run \(K\) on \(x_{t-1}\), producing random choices \(u_K\), and ultimately outputting the pair \((x_t, u_L)\) – then it must be the case that if we run \(L\) on \(x_t\), and it makes random choices \(u_L\), then it will output the pair \((x_{t-1}, u_K)\). That is, the value \(u_L\) output by the forward proposal has to correctly specify how the backward proposal would sample \(x_{t-1}\) and propose what choices \(u_K\) it made.
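To make this constraint concrete, here is a toy plain-Julia sketch (hypothetical stand-alone code, not the GenSMCP3 API) of the bootstrap-style proposal pair from earlier in this notebook: the backward proposal is deterministic, so \(u_L\) is empty, and the forward proposal’s only random choice is the new state itself.

```julia
# Forward proposal K: extend the trajectory by sampling a new state.
# Its only random choice (u_K) is the new state; it also outputs u_L,
# the (empty) set of choices the backward proposal would make.
function run_K(traj)
    zₜ = last(traj) + randn()
    u_K = zₜ
    u_L = ()                       # backward proposal is deterministic
    return ([traj; zₜ], u_L), u_K
end

# Backward proposal L: drop the last state, and report that the forward
# proposal's choice u_K was that state.
run_L(traj) = (traj[1:end-1], last(traj))

# Round-trip check: running L on K's output recovers K's input and choices.
traj = [0.0]
out, u_K = run_K(traj)
new_traj, u_L = out
prev_traj, recovered_u_K = run_L(new_traj)
@assert prev_traj == traj && recovered_u_K == u_K
```

The assertion at the end is exactly the invariance described above: given \(x_t\) (and the choices \(u_L\)), the backward proposal reproduces \((x_{t-1}, u_K)\).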

In the above example, the only choice made by the backward proposal is the value \(v\), at address v_sampled_by_backward_proposal. You can check that we indeed have the forward proposal output a choicemap which specifies what the backward proposal would sample at this address, when inverting an update.

Running the smart SMCP3 algorithm

Now, we can run the smart SMCP3 algorithm, and observe how it improves inference quality!

function smart_smcp3(observations, n_particles; kwargs...)
    return particle_filter(
        observations, n_particles,
        y -> ( 
            SMCP3Update(
                smart_forward_proposal, backward_proposal_for_smart_forward,
                (y,), (y,)
            ),
        );
        kwargs...
    )
end
# Run inference
(final_smart_SMCP3_inference, smart_SMCP3_inference_results) = smart_smcp3(
    challenging_observations,
    30;
    output_intermediate_inferences=true
);
# Visualize
f_SmartSMCP3, t_SmartSMCP3 = plot_observation_and_particles(
    VIZ_DOMAIN(),
    t -> challenging_observations[t], # observations
    
    # At time `T`, visualize the inferred trajectories as of time T.
    # (Index at T + 1, because the initial timestep is T=0, and Julia is 1-indexed.)
    T -> state_to_weights_and_trajectories(smart_SMCP3_inference_results[T + 1])
)
f_SmartSMCP3
@async animate!(t_SmartSMCP3, 50, framerate=5)

Unlike the bootstrap particle filter, this algorithm does a great job tracking the object! And if we freeze it at one of the frames the bootstrap filter had a hard time with, we see that this SMCP3 algorithm has no trouble.

t_SmartSMCP3[] = 11

Conclusion

That’s it! After this tutorial, hopefully you have a sense of how you can use GenSMCP3 to write SMC algorithms using probabilistic program proposals.

See the SMCP3 paper for quantitative results in this model, if you’re interested. (Our code for generating those results is available here.)

This repository also contains our code for the other experiments in the SMCP3 paper. Our intention is to eventually populate the GenSMCP3 repository with cleaned-up implementations of these examples, or potentially other useful tutorial SMCP3 algorithms; if and when we find time for this, we will update this paragraph, and the GenSMCP3 repository README with pointers to these.